Optimizing DDA Code on a POWER5 Processor

Author

Adam Jundt

Abstract

In this paper we take an existing scientific computation code, DDA, and optimize it to run on an IBM POWER5 processor. The DDA code, originally developed by a Ph.D. candidate in physics, suffers from excessive execution time caused by a high number of cache accesses and a low rate of instructions per cycle (IPC). Our goal is to improve the code’s performance through a series of step-by-step optimizations. The first and second stages consisted of selecting specific optimization parameters available in IBM’s compiler, xlC. The next stage was to optimize the code by hand, concentrating mainly on loop fusion techniques. The last stage was to incorporate OpenMP into the code in order to take advantage of the dual cores available on the POWER5 system. Using the IBM High Performance Toolkit, we recorded the change in the number of L1 data cache misses and references, IPC, and execution time after each phase of optimization. With the original, unoptimized source code as the baseline for our experiments, we obtained a speedup of 12x for “compiler only” optimizations and an overall speedup of 42x after all modifications were made.
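The two manual stages named in the abstract, loop fusion followed by OpenMP, can be illustrated with a minimal sketch in C. It is not taken from the DDA code: the function names `scale_then_sum_unfused` and `scale_then_sum_fused`, and the scale-then-sum kernel itself, are hypothetical and chosen only for illustration. The unfused version walks the array twice, while the fused version touches each element once and divides the iterations among threads with a standard OpenMP `parallel for` and a `reduction` clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not from the DDA code.
 * Unfused form: two separate passes over x, so every element is
 * loaded from memory twice. */
double scale_then_sum_unfused(double *x, size_t n, double a) {
    assert(x != NULL);
    for (size_t i = 0; i < n; i++) {
        x[i] *= a;                 /* pass 1: scale in place */
    }
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];               /* pass 2: accumulate */
    }
    return sum;
}

/* Fused form: one pass does both the scaling and the accumulation,
 * so each element is loaded once. The OpenMP pragma splits the
 * iterations across threads; the reduction clause gives each thread
 * a private partial sum and combines them at the end. If the code is
 * compiled without OpenMP support, the pragma is ignored and the
 * loop runs serially. */
double scale_then_sum_fused(double *x, size_t n, double a) {
    assert(x != NULL);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        x[i] *= a;
        sum += x[i];
    }
    return sum;
}
```

Because a parallel reduction adds the partial sums in a different order than the serial loop, the two versions can differ in the last bits for general floating-point input. Whether fusion pays off in a given code depends on its data dependences and on how much of the working set fits in cache.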

Similar resources

Using Large Page and Processor Binding to Optimize the Performance of OpenMP Scientific Applications on an IBM POWER5+ System

Multicores are widely used for high performance computing and are being configured in a hierarchical manner to compose a multicore system. While this presents significant new opportunities, such as high inter-core bandwidth and low inter-core latency, it also presents new challenges in the form of inter-core resource conflict and contention. A challenge to be addressed is how well current share...

A Tale of Two Processors: Revisiting the RISC-CISC Debate

The contentious debates between RISC and CISC have died down, and a CISC ISA, the x86, continues to be popular. Nowadays, processors with CISC ISAs translate the CISC instructions into RISC-style micro-operations (e.g., uops of Intel and ROPS of AMD). The use of the uops (or ROPS) allows the use of RISC-style execution cores, and use of various micro-architectural techniques that can be easily imp...

Advanced virtualization capabilities of POWER5 systems

IBM POWER5 systems combine enhancements in the IBM PowerPC processor architecture with greatly enhanced firmware to significantly increase the virtualization capabilities of IBM POWER servers. The POWER hypervisor, the basis of the IBM Virtualization Engine technologies on POWER5 systems, delivers leading-edge mainframe virtualization technologies to the UNIX marketplace. In addition to bei...

A Study of the Influence of the POWER5 Dynamic Resource Balancing (DRB) on Optimal Hardware Thread Priorities

Simultaneous Multithreading, often abbreviated SMT, is a technique for improving the overall efficiency of superscalar processors with hardware multithreading. SMT permits a processor to concurrently execute multiple independent instruction streams every clock cycle, potentially improving processor throughput. However, this can introduce contention for shared resources amongst threads running c...

IBM power5 chip: a dual-core multithreaded processor - Micro, IEEE

IBM introduced Power4-based systems in 2001. The Power4 design integrates two processor cores on a single chip, a shared second-level cache, a directory for an off-chip third-level cache, and the necessary circuitry to connect it to other Power4 chips to form a system. The dual-processor chip provides natural thread-level parallelism at the chip level. Additionally, the Power4’s out-of-order ex...

Publication date: 2007
